ABSTRACT. Proteases regulate various aspects of the life cycle in all organisms by cleaving specific peptide bonds. Their
action is so central to biochemical processes that at least 2% of any known genome encodes proteolytic enzymes. Here
we show that selected protease pairs, despite differences in oligomeric state, catalytic residues and fold, share a common
structural organization of functionally relevant regions, which are further shown to undergo similar concerted movements. The
structural and dynamical similarities found pervasively across evolutionarily distant clans point to common mechanisms for
peptide hydrolysis.

I. INTRODUCTION 

Proteases (PR's hereafter) perform enzymatic cleavage of peptide bonds in an enormous variety of biological processes 1 
including cell growth, cell death, blood clotting, immune defense and secretion. Viruses and bacteria use PR's for their life 
cycle and for infection of host cells, rendering proteases key targets for antiviral and anti-bacterial intervention. PR's enzymatic 
action is accomplished by a wide repertoire of possible residues, Ser, Asp, Cys, Glu and Thr or even metal ions, giving rise to 
six different classes of enzymes. The enzymatic reaction is believed to involve in all cases a nucleophilic attack on a specific
amide carbon belonging to the substrate main chain. The nucleophilic agent can be (a) the OH or the SH group of the namesake
residues in Ser, Thr and Cys proteases; (b) a water molecule activated by the presence of an aspartic dyad or of a glutamate for
Asp and Glu proteases; (c) a Zn-bound water molecule or OH group in metalloproteases.

The large variety of catalytic active sites is paralleled by significant sequence and structural diversity: The approximately 
2,000 proteases of known structure can, in fact, be assigned to as many as thirteen distinct folds 1 . Several attempts have been 
made to identify common features across the various protease folds and clans. So far, the only trait apparently shared by PR's 
is the fact that the peptide substrate in the catalytic cleft takes an extended β-conformation. Here, by employing a novel
quantitative methodological framework we extend significantly previous investigations of PR relatedness. First, by using bioin- 
formatics tools we show that a previously unnoticed and statistically-significant structural correspondence exists among a dozen 
distinct protease clans. Such relatedness was previously pointed out only among the two known folds of cytoplasmatic aspartic 
proteases, namely pepsins and retropepsins 5 . Remarkably, extensive molecular dynamics simulations 6,7 revealed qualitatively- 
similar large scale movements for these Asp PR folds. Prompted by this fact we next carry out a systematic investigation 
of common functional dynamics in all pairs of structurally-related PR's. This step is accomplished within a novel framework, 
based on coarse-grained elastic network models, which is straightforwardly transferable to other enzymatic superfamilies.
Through this effective quantitative strategy we unveil the unsuspected and pervasive similarity of large-scale dynamical
fluctuations that accompany concerted rearrangements for many, albeit not all, pairs of PR folds. The extensive comparison
of structural and dynamical features across the entire set of PR folds suggests that several PR's besides Asp proteases share
common conformational fluctuations impacting on their biological function. 

II. METHODS 

Structural bioinformatics. A set of reference structures of PR's common folds 4 was selected using criteria of minimal
sequence and structural redundancy. To this purpose the set of 1,928 presently-determined PR structures, comprising 13 major
folds, was intersected with the PDBselect 14 list of structurally-resolved proteins with sequence identity smaller than 25%, i.e.
below the twilight zone of structural similarity 15 . This led to a set of 69 structures, covering all seven common folds. For
a comprehensive coverage of PR structural diversity we subdivided the structures according to the CATH criteria for class,
architecture and topology 16 . For all common folds, A-G, several structures shared the same CATH labelling. For each of these
groups we retained the entry with the most complete PDB structure and, whenever available, in complex with a ligand. For the
uncommon folds (i.e. folds represented by one or very few non-redundant PDB structures) we used the same representatives
previously identified by Tyndall et al. 4 . The complete set of representatives is shown in Figure 1.

Common structural traits were next sought with the DALI algorithm 17 in the 136 distinct pairs of our representatives. DALI
identifies blocks of residues having similar inter-residue distances. The consistency of the pairwise residue distances in two
matching regions (based on the three-dimensional structure of the main chain with no sequence information) is measured by
means of a knowledge-based score, $\sigma$. The optimal alignment returned by DALI is the one maximising the score, $\sigma_{opt}$, and
can comprise several distinct blocks. The order in which matching blocks appear in one protein is not necessarily the same in
the partner one and the sequence directionality in two corresponding blocks may be reversed. These features endow DALI with
considerable flexibility for identifying regions with common structural organization. The statistical relevance of the optimal
DALI score $\sigma_{opt}$ is quantified by the standard Z-score,

$$ Z = \frac{\sigma_{opt} - \sigma_{ave}}{\Delta\sigma_{ave}} , $$

where $\sigma_{ave}$ and $\Delta\sigma_{ave}$ are, respectively, the average score and dispersion expected for structurally-unrelated
proteins of length equal to the aligned ones. Assuming that the probability distribution of $\sigma$ is approximately Gaussian,
alignments with Z-score greater than 2 ought to have a probability of roughly 2% or less of being generated by chance 17 .
Finally, the oligomeric state of each active unit was fully taken into
account in the structural alignments by merging all the polypeptide chains found in the biological unit deposited in the PDB and 
by running DALI on the "merged" chains. Accordingly, the resulting optimal alignments turned out to be, in general, different
from those deposited in the DALI/DCCP database, in which each chain is considered separately 18 .
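
As a quick numerical check of the figures quoted above, the following minimal Python sketch counts the distinct pairs that can be formed with the 17 fold representatives and evaluates the one-sided Gaussian tail probability associated with a given Z-score. The function names are illustrative, and the one-sided Gaussian tail is an assumption about how the chance probability is meant to be read.

```python
from itertools import combinations
from math import erfc, sqrt


def z_score(value, control_mean, control_std):
    """Number of control standard deviations by which `value` exceeds the control mean."""
    return (value - control_mean) / control_std


def gaussian_tail(z):
    """One-sided probability that a standard Gaussian variable exceeds z."""
    return 0.5 * erfc(z / sqrt(2.0))


# Fold labels of the 17 representatives (A-G common folds, H-M uncommon folds).
representatives = ["A", "B", "C", "D1", "D2", "D3", "E", "F1", "F2",
                   "G1", "G2", "H", "I", "J", "K", "L", "M"]
pairs = list(combinations(representatives, 2))

print(len(pairs))          # number of distinct pairs of representatives
print(gaussian_tail(2.0))  # chance probability associated with Z = 2
```

Under the Gaussian assumption the tail probability at Z = 2 is slightly above 2%, which is why the threshold is best read as approximate.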

Protein large scale motions. Prompted by Ref. 6 , in which a similarity of large-scale motions was established between the two 
known folds of cytoplasmatic Asp proteases, we aimed at establishing the consistency of the slow modes of each aligned pair of 
PR representatives X and Y . 

In several contexts these concerted rearrangements have been shown to be conditioned, and hence well-described, by the
slowest modes of fluctuation around an enzyme's average structure. Well-established procedures exist for calculating such
modes in MD simulation contexts (i.e. by principal components analysis of the covariance matrix) 22,23 . The reliable identification
of the essential spaces typically requires the monitoring of the system evolution over tens of nanoseconds 24 , entailing a very
onerous computational expenditure for proteins of a few hundred amino acids. It is therefore apparent that such analysis cannot
be carried out for each of the 17 PR representatives under consideration. We have hence resorted to a coarse-grained model, the
β-Gaussian network model 13 , which provides a reliable (by comparison against atomistic simulations) description of concerted
large-scale rearrangements in proteins with a negligible computational expenditure. In this approach, the concerted motions are
calculated within the quasi-harmonic approximation of the free energy $\mathcal{F}$ around a protein's native state (assumed to coincide
with the crystallographic structure). Thus, a displacement from the native state $\delta R = \{\delta r_1, \delta r_2, \ldots, \delta r_N\}$ ($\delta r_i$ being the
displacement of C$_\alpha$ atom $i$) is associated with the change in free energy $\Delta\mathcal{F} \approx \frac{1}{2}\, \delta R^\dagger F\, \delta R$, where $F$ is an "interaction" matrix
constructed from the knowledge of contacting C$_\alpha$ and C$_\beta$ centroids in the native state 13 and the $\dagger$ superscript indicates the
transpose. The large scale motions of the system correspond to the eigenvectors of $F$ having the smallest non-zero eigenvalues 13 .

FIG. 1: Common (A-G) and uncommon PR folds (H-M). PDB codes and length of representatives are as follows: A 1er8 (330), B 1nh0
(198), C 1uk4A (302), D1 1avp (204), D2 1ga6 (369), D3 1ioi (208), E 1jq7A (210), F1 1k3bA (119), F2 1me4 (215), G1 1kuf (201), G2 8cpa
(307), H 1pmaA (221), I 1n6e (1023), J 1qfs (710), K 1i78A (297), L 1rr9A (182), M 1s2k (199).
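
To make the notion of "slow modes" concrete, the sketch below builds an interaction matrix for a set of C$_\alpha$ coordinates and extracts the eigenvectors with the smallest non-zero eigenvalues. It is only an illustration: it uses a plain anisotropic elastic network with a single distance cutoff and spring constant (both assumed values), not the β-Gaussian model of Ref. 13, which also includes C$_\beta$ centroids. The function names are hypothetical.

```python
import numpy as np


def anm_interaction_matrix(coords, cutoff=10.0, k=1.0):
    """3N x 3N interaction matrix of a simple anisotropic elastic network.

    coords: (N, 3) array of C-alpha positions; cutoff (same units as coords)
    and spring constant k are illustrative assumptions.
    """
    n = len(coords)
    F = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue  # residues i and j are not in contact
            block = -k * np.outer(d, d) / r2
            F[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            F[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal blocks accumulate minus the off-diagonal ones.
            F[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            F[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return F


def slow_modes(F, n_modes=10, tol=1e-8):
    """Eigenvalues and eigenvectors of F with the smallest non-zero eigenvalues."""
    evals, evecs = np.linalg.eigh(F)  # eigenvalues in ascending order
    nonzero = evals > tol * evals.max()  # discard rigid-body (zero) modes
    return evals[nonzero][:n_modes], evecs[:, nonzero][:, :n_modes]
```

For a well-connected three-dimensional network the six zero modes correspond to rigid translations and rotations, so the first columns returned by `slow_modes` are the softest internal deformations.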

As we are interested only in the concerted motions of aligned regions of PR pairs, we subdivide the residues of each
representative X and Y into two sets according to whether they take part in the DALI-aligned regions (set A, characterized by
displacements $\delta R_A$) or not (set B, characterized by $\delta R_B$). Residues in set A are ordered so that amino acids in structural
correspondence appear in the same order for the two proteins. $\Delta\mathcal{F}$ then reads:

$$ \Delta\mathcal{F} = \frac{1}{2} \begin{pmatrix} \delta R_A^\dagger & \delta R_B^\dagger \end{pmatrix} \begin{pmatrix} F_A & G \\ G^\dagger & F_B \end{pmatrix} \begin{pmatrix} \delta R_A \\ \delta R_B \end{pmatrix} , $$

where $F_A$ [$F_B$] is the interaction matrix within set A [B] and $G$ contains the pairwise couplings across the two sets. The
probability of occurrence of displacements $\delta R_A$ and $\delta R_B$ in thermal equilibrium is given by the Boltzmann distribution.
Neglecting the normalization factor, it reads:

$$ P(\delta R_A, \delta R_B) = \exp\left(-\frac{\Delta\mathcal{F}}{kT}\right) = \exp\left(-\frac{\delta R_A^\dagger F_A\, \delta R_A + \delta R_B^\dagger F_B\, \delta R_B + 2\, \delta R_A^\dagger G\, \delta R_B}{2kT}\right) . \qquad (1) $$

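Since we focus only on the free energy change associated with residues in set A, we calculate the probability distribution for
set A integrated over all displacements in set B. The integration can be evaluated analytically and yields

$$ P(\delta R_A) = \exp\left(-\frac{\Delta\mathcal{F}_A}{kT}\right) = \int d\delta R_B\, P(\delta R_A, \delta R_B) \propto \exp\left[-\frac{\delta R_A^\dagger \left(F_A - G F_B^{-1} G^\dagger\right) \delta R_A}{2kT}\right] , \qquad (2) $$

hence:

$$ \Delta\mathcal{F}_A = \frac{1}{2}\, \delta R_A^\dagger \left(F_A - G F_B^{-1} G^\dagger\right) \delta R_A . \qquad (3) $$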

Thus, the eigenvectors associated with the smallest eigenvalues of $(F_A - G F_B^{-1} G^\dagger)$ represent the integrated slow modes of
the matching regions; the term "integrated" is used to stress the fact that the modes depend also on the non-matching ones via
the contributions $G$ and $F_B$. The eigenvectors of $F_A$, instead, will be termed "bare" slow modes since they neglect the presence
of the non-matching regions. The comparison of the integrated and bare essential dynamical spaces is used here to investigate
the influence of the non-matching regions over the dynamics of the matching ones.
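
The distinction between integrated and bare descriptions can be sketched numerically as follows. Given a full interaction matrix and the list of degrees of freedom belonging to the matching set A, the integrated matrix is the Schur complement $F_A - G F_B^{-1} G^\dagger$, while the bare one is simply the A-A block. The helper names are hypothetical, and the sketch assumes that the B-B block is invertible, which is expected when set B is elastically coupled to set A.

```python
import numpy as np


def dof_indices(residues):
    """Cartesian degrees of freedom (3 per residue) for a list of residue indices."""
    residues = np.asarray(residues)
    return (3 * residues[:, None] + np.arange(3)).ravel()


def bare_matrix(F, idx_A):
    """A-A block of F: the non-matching set B is ignored altogether."""
    idx_A = np.asarray(idx_A)
    return F[np.ix_(idx_A, idx_A)]


def integrated_matrix(F, idx_A):
    """Effective matrix for set A after integrating out the remaining set B."""
    idx_A = np.asarray(idx_A)
    idx_B = np.setdiff1d(np.arange(F.shape[0]), idx_A)
    F_A = F[np.ix_(idx_A, idx_A)]
    F_B = F[np.ix_(idx_B, idx_B)]
    G = F[np.ix_(idx_A, idx_B)]
    # Schur complement F_A - G F_B^{-1} G^T; solve() avoids forming the inverse.
    return F_A - G @ np.linalg.solve(F_B, G.T)
```

A useful consistency check is that, for an invertible full matrix, the inverse of the integrated matrix coincides with the A-A block of the inverse of the full matrix, i.e. with the covariance of set A alone.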

The eigenvectors of $(F_A - G F_B^{-1} G^\dagger)$, calculated separately for proteins X and Y, can be directly compared component by
component (we assume that X and Y are represented in the Cartesian coordinate set providing the optimal structural
superposition of the DALI matching regions). To measure the agreement of the integrated dynamics of proteins X and Y we hence
considered the root mean square inner product (RMSIP) of the top 10 slowest modes, $v_i$ and $w_j$, of $(F_A - G F_B^{-1} G^\dagger)$ for X and
Y, respectively 26 ,

$$ \mathrm{RMSIP}(\mathrm{set\ A}) = \sqrt{ \frac{1}{10} \sum_{i=1}^{10} \sum_{j=1}^{10} \left| v_i \cdot w_j \right|^2 } . \qquad (4) $$

If a comparison is sought for the "bare" dynamics, the eigenvectors of $F_A$ are used in place of those of $(F_A - G F_B^{-1} G^\dagger)$.
The value taken on by the RMSIP, ranging from 0 (complete absence of correlation) to 1 (exact coincidence of the slow modes),
is compared with a control RMSIP distribution to assess its statistical significance. The term of comparison is given by the
distribution of RMSIP values resulting from randomly choosing the residues in set A, that is for arbitrary choices of the blocks of
corresponding residues in structures X and Y. Accordingly, we stochastically generated 100 "decoy" sets of matching residues
in X and Y involving the same number of amino acids as the optimal DALI alignment of X and Y. Also the typical size
of DALI matching blocks (10-15 residues) is respected in the control alignments. For each stochastic alignment we carried
out numerically the dynamical integration described above and hence obtained the corresponding RMSIP value from equation
(4). By processing the results of the 100 decoy alignments we calculated the average value and dispersion of the control
RMSIP distribution, $\langle \mathrm{RMSIP} \rangle$ and $\Delta \mathrm{RMSIP}$. These quantities were used to define the dynamical Z-score:
$(\mathrm{RMSIP}_{dali} - \langle \mathrm{RMSIP} \rangle)/\Delta \mathrm{RMSIP}$. In analogy to the structural Z-score, it provides a measure of how unlikely it is that the RMSIP of the
DALI matching regions could have arisen by chance.
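
The two dynamical indices described above can be sketched in a few lines of Python. The sketch assumes that each set of modes is stored as the orthonormal columns of an array and that the "dispersion" of the control distribution is the sample standard deviation; both are assumptions made for illustration, as are the function names.

```python
import numpy as np


def rmsip(V, W):
    """Root mean square inner product of two sets of modes.

    V, W: arrays of shape (n_dof, n_modes) with orthonormal columns,
    expressed in the same (superposed) Cartesian frame.
    """
    n_modes = V.shape[1]
    overlaps = V.T @ W  # matrix of scalar products v_i . w_j
    return float(np.sqrt(np.sum(overlaps ** 2) / n_modes))


def dynamical_z_score(rmsip_dali, rmsip_decoys):
    """Z-score of the DALI-alignment RMSIP against a set of decoy RMSIP values."""
    decoys = np.asarray(rmsip_decoys, dtype=float)
    return float((rmsip_dali - decoys.mean()) / decoys.std(ddof=1))
```

With this definition two identical sets of modes give an RMSIP of 1, two mutually orthogonal sets give 0, and sets sharing half of their directions give $1/\sqrt{2} \approx 0.71$.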

The viability of this procedure for comparing the large-scale movements of the matching residues in two proteins was tested
within the context of atomistic MD simulations in aqueous solution. In particular, the analysis was carried out on two trajectories
of 10 and 20 ns previously obtained by us for HIV-1 PR and BACE, respectively. Since dynamical trajectories are available,
it is not necessary to resort to the β-Gaussian model for calculating the integrated essential dynamical spaces of set A. The latter
are, in fact, calculated from the covariance matrix constructed for the matching regions alone (i.e. removing the roto-translation
of the latter) 27 . The standard definition of covariance matrix is employed, i.e. the generic matrix element reads

$$ C_{ij,\alpha\beta} = \langle \delta r_{i,\alpha}\, \delta r_{j,\beta} \rangle_t , \qquad (5) $$

where $\langle \cdot \rangle_t$ denotes the time average of the displacements (at equal times) of residues $i$ and $j$ corresponding to the Cartesian
components $\alpha$ and $\beta$. Within our quadratic approximation for the free energy, the principal components of the covariance
matrices exactly correspond to the slow modes. For this reason we shall measure the dynamical accord in the atomistic simulations
context by considering both the RMSIP calculated over the principal spaces of the matrices $C_X$ and $C_Y$ and the
similarity of corresponding entries in the two matrices. The dynamical RMSIP of the DALI matching regions obtained from this
approach was found to be equal to 0.65, consistent with the high statistical significance of the correlation between corresponding
entries of the two normalised reduced covariance matrices, $\tilde{C}$ (definition: $\tilde{C}_{ij} = \sum_\alpha C_{ij,\alpha\alpha} / \sqrt{\sum_\alpha C_{ii,\alpha\alpha} \sum_\beta C_{jj,\beta\beta}}$), as visible in Figure 2.

FIG. 2: Scatter plot of corresponding matrix elements of the integrated reduced normalised covariance matrices for BACE and HIV-1 PR
obtained from MD simulations in explicit solvent. The linear correlation coefficient of the 14,000 distinct entries is 0.77. The non-parametric
Kendall correlation coefficient is, instead, τ = 0.41, corresponding to an extremely large statistical significance (z ~ 70).
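
The covariance matrix of equation (5) and its reduced normalised form can be computed from a trajectory along the following lines. This is a minimal sketch that assumes the frames have already been superposed on the matching regions (so that roto-translations are removed) and are stored as an array of shape (frames, residues, 3); the function names and the array layout are illustrative choices.

```python
import numpy as np


def covariance_matrix(traj):
    """Covariance C[i, a, j, b] = < dr_{i,a} dr_{j,b} >_t of a superposed trajectory.

    traj: array of shape (n_frames, n_residues, 3).
    """
    disp = traj - traj.mean(axis=0)  # displacements from the average structure
    return np.einsum("tia,tjb->iajb", disp, disp) / traj.shape[0]


def reduced_normalised_covariance(C):
    """Trace over Cartesian components, then normalise by the residue fluctuations."""
    reduced = np.einsum("iaja->ij", C)  # sum over a of C[i, a, j, a]
    norm = np.sqrt(np.diag(reduced))
    return reduced / np.outer(norm, norm)
```

By construction the reduced normalised matrix has unit diagonal and off-diagonal entries between -1 and 1, which makes the entries of two different proteins directly comparable in a scatter plot.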



III. RESULTS AND DISCUSSION



Structural alignment across PR's. 

(i) Identification of representatives. The set of 1,928 presently-known proteases was initially reduced to a collection of 69 
entries with minimal mutual sequence identity. The resulting structures, which covered the whole range of common folds, were 
then subdivided according to the CATH criteria for class, architecture and topology 16 . For each CATH entry, we then selected 
the most complete structure and, whenever available, one in complex with a peptide-mimic substrate. For the six uncommon
folds we retained, instead, the representatives previously identified by Tyndall et al. 4 . The 17 representatives are shown in
Figure 1. Besides the major structural differences across folds it is interesting to notice that folds D, F and G possess a fair
degree of internal structural heterogeneity at the "topology" level of the CATH classification scheme 16 and that only D3 and G2
are exopeptidases. Except for representatives F1 and F2 and those of pepsins and retropepsins (folds A and B), all other reference
structures belong to distinct clans according to the MEROPS classification 1 . Since this is indicative of a different evolutionary
origin, any common property found persistently in members of Figure 1 arguably reflects a convergent evolutionary pressure.

(ii) Alignment of pairs of representatives. Because of the major structural differences across PR's, global structural matches 
of the representatives were not attempted. Rather, we looked for partial structural alignments among all the representative pairs 
(136) using the DALI algorithm 17 . DALI identifies corresponding blocks of residues having similar inter-residue distances (with 
either direct or reversed sequence directionality), and provides a score function (Z-score), conveying the statistical significance 
of the alignment. Values of the Z-score of about 2 or larger correspond to alignments expected to have a probability of about 2%
or less of being generated by chance. The top 20 pairs, having Z-score of about 2 or larger, are shown in Table I.

Such alignments are typically constituted by several disconnected matching blocks with the same directionality. This is
illustrated in Figure 3, which portrays statistically-relevant alignments against representatives D3 (PDB entry 1IOI) and B (1NH0).
The former was chosen owing to the large number of significant alignments in which it takes part, while the latter provides
structural support to the recent suggestion of evolutionary relatedness of eukaryotic and viral Asp proteases 5,6,28 (first two
structures in Figure 1).

   Fold from    Fold from    Length   Seq. Id.   RMSD   DALI      Dynamical   Dynamical
   protein 1    protein 2             (%)        (Å)    Z-score   RMSIP       Z-score
   --------------------------------------------------------------------------------------
 * A (Asp)      B (Asp)      168      14         3.7    10.4      0.71        36.86
   I (Ser)      J (Ser)      257      11         5.6     8.9      0.70        11.56
 * D3 (Cys)     G2 (Met)     150       7         3.3     8.8      0.74        10.56
   J (Ser)      G2 (Met)     151       6         4.0     5.2      0.70        11.85
   D3 (Cys)     J (Ser)      101      10         3.0     4.2      0.70        10.18
   K (Asp)      F1 (Cys)      84      10         3.9     4.0      0.65        15.71
   D2 (Ser)     I (Ser)      130       8         4.8     3.8      0.58         5.57
 * D2 (Ser)     D3 (Cys)      95       6         4.6     3.3      0.69         8.70
 * D2 (Ser)     G1 (Met)     125       6         4.5     3.2      0.71        12.07
   D2 (Ser)     J (Ser)      134       7         3.9     3.1      0.72         9.33
   D3 (Cys)     I (Ser)       94      10         3.7     2.9      0.64         6.84
 * D3 (Cys)     G1 (Met)      89       8         3.4     2.7      0.67         7.34
 * G1 (Met)     G2 (Met)     104       8         4.0     2.6      0.65         8.84
   G1 (Met)     L (Ser)       73       1         3.0     2.3      0.70         9.37
   D2 (Ser)     L (Ser)      103      10         4.9     2.3      0.69        12.93
 * D2 (Ser)     G2 (Met)     138       9         4.5     2.1      0.65         6.55
   I (Ser)      L (Ser)       58       9         3.7     2.0      0.77         9.34
   D3 (Cys)     L (Ser)       65      11         3.8     1.9      0.73         8.62
   F2 (Cys)     H (Ser)       54       7         3.2     1.8      0.68         7.15
 * B (Asp)      C (Ser)       73       7         4.8     1.8      0.69         9.87

TABLE I: Top 20 structural alignments of pairs of representative proteases ranked according to the statistical significance (DALI Z-score).
The fold (see Figure 1) and chemical class of the pairs are provided in the first two columns. The total number of aligned residues is given
in column 3 along with the sequence identity (Seq. Id.) and RMSD over the matching regions. Pairs of common folds (A-G) are marked with
an asterisk. The dynamical accord of the pairs and the associated statistical significance are provided in the last two columns.

As visible in Figure 4, the partial structural alignments (which, we stress, are oblivious to sequence specificity) typically present
a superposition of the PR catalytic residues. Even more notable is the active site correspondence found in PR's with different
catalytic residues (see Figure 4b and c). The fact that alignments are built around the active site implies a good consistency of the
regions alignable against different PR's. This can be readily perceived in the pile-up diagrams of Figure 3.

In contrast, the alignability of PR's with generic proteins is much poorer. This was established by considering the publicly-
available DCCP listing of DALI alignments among several thousands of protein representatives 14,17 . We considered all
alignments spanning more than 70 residues and involving at least one known protease. More than 80,000 such alignments were
found, which were ranked in terms of DALI Z-score. Within the 40,000 top alignments (Z-score greater than 6) we found that
70% of the matches involved protease pairs, not necessarily of the same class. In other words, whenever a protease admits
a structural alignment with high statistical significance, the partner protein is likely to be another protease. Since no sequence
information is used by DALI, the high selectivity of matches involving PR's hints at a functional basis for the observed structural
correspondence.

FIG. 3: Pile-up of alignments involving (a) fold D3 and (b) fold B. The boxed panel is the linear representation of the secondary structure
content of the reference protein (red: helix, blue: extended, yellow: loop/turn). Above the box: arrows indicate the location of the catalytic
residues, thick [thin] segments indicate amino acids within 7 [10] Å of the catalytic sites. For each aligned protein we show, below the box,
the location of the matching residues and the corresponding secondary content.



Large-scale dynamics across PR's. 



The extent and significance of the structural matches found here, spanning members of distinct protease clans, is suggestive of
a biological selection criterion transcending the chemical determinants. An appealing possibility is the existence of an underlying
unifying principle related to the necessity of proteolytic catalysis to rely on well-defined concerted functional movements.

Whilst the presence of concerted motions in enzymes is well-established 6,7 , the relevance of
such movements for catalysis is a subject of lively debate. Several groups have argued that these
movements could be a result of specific protein architectures aimed at preserving the rigidity of the active site region. In
the specific case of Asp proteases, different lines of research suggest that conformational fluctuations may play a role for the
function and involve conserved structural features across the family 5,6,28 .

Prompted by these suggestions, we extended the investigation of common large-scale dynamics to all pairs of PR
representatives. To calculate the slowest modes (essential dynamical spaces 26 ) of each representative we have used a relatively accurate and
computationally affordable coarse-grained approach, the β-Gaussian network model 13 . This method was employed in a novel
context (see Methods) which allows one to describe the protein large-scale movements in the frame of reference of the matching
regions. Since the dynamical influence of the non-matching ones is, nevertheless, taken into account we shall term the approach
"integrated" to distinguish it from the "bare" description where the non-matching residues are entirely omitted.

Two indexes were used to identify the degree of correlation between the slow motions of each pair of representatives for both the
integrated and bare case. The first one is the so-called RMSIP, which provides a quantitative estimate of the consistency of the
10 slowest modes of the proteins (RMSIP=0 [1] corresponds to no [full] correlation). The second one is the dynamical Z-score
which, in analogy with the structural one, measures the statistical significance of the observed accord (by comparison against
randomly-generated "DALI-like" alignments).

FIG. 4: Top structural alignments (according to Z-score) of common protease folds, see Table I: (a) endothiapepsin (Asp) and HIV-1
retropepsin (Asp) - folds A and B; (b) pyroglutamyl-peptidase I (Cys) and carboxypeptidase A1 (Zn) - folds D3 and G2; (c) pyroglutamyl-
peptidase I (Cys) and sedolisin (Ser) - folds D3 and D2. Catalytic residues are drawn as spheres. The thick backbone highlights the overlapping
region.

The dynamical accord reported in Table I turns out not to capture a mere consistency of overall mobility, but reflects the close
correspondence of the directionality of the slow modes at a residue-wise level. The inspection of the principal directions of
the large-scale movements (see Figure 5) indicates prominent rearrangements of active site surroundings (i.e. flaps, cleavage,
and recognition sites) resulting in a distortion of the crevice accommodating the substrate. This is suggestive of a common
dynamical selection operated by the necessity to recognise/process peptides in well-defined geometrical arrangements (such as
the β-extended one ubiquitously observed in bound PR substrate analogs 4 ). Consistently, the integrated dynamical movements
found here for Asp PR's appeared to be directly related to functional dynamics. In fact, the difference vector describing the
structural distortion between inactive and reactive conformations of HIV-1 PR 7,40 is mostly concentrated (91% of the norm) on
the regions that match with BACE. Furthermore, the top 10 slow modes of the matching regions are able to account for 71% of
the norm of the difference vector over the same set of residues.

FIG. 5: Dynamical overlap for the same PR pairs of Figure 4: (a) endothiapepsin (Asp) and HIV-1 retropepsin (Asp) - folds A and B; (b)
pyroglutamyl-peptidase I (Cys) and carboxypeptidase A1 (Zn) - folds D3 and G2; (c) pyroglutamyl-peptidase I (Cys) and sedolisin (Ser) -
folds D3 and D2. Red/pink and blue/cyan colors denote the dynamical and structural features of the aligned pairs. The top three essential
dynamical spaces of the matching regions in the two proteins were considered. The directions of the 20 largest displacements of the best
overlapping pair of modes are shown as arrows of equal length.
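
The two percentages quoted above for the HIV-1 PR difference vector (the share of its norm located on the matching regions, and the share captured by the top slow modes) correspond to simple projections. The sketch below shows one way to compute them; it assumes that the shares are ratios of Euclidean norms rather than of squared norms, and that the modes are orthonormal columns restricted to the matching residues. Function names are hypothetical.

```python
import numpy as np


def concentration_on_subset(diff, idx_A):
    """Fraction of the norm of `diff` carried by the components listed in idx_A."""
    idx_A = np.asarray(idx_A)
    return float(np.linalg.norm(diff[idx_A]) / np.linalg.norm(diff))


def captured_fraction(diff_A, modes):
    """Fraction of the norm of `diff_A` lying in the span of orthonormal `modes`.

    diff_A: difference vector restricted to the matching regions, shape (n_dof,).
    modes:  array of shape (n_dof, n_modes) with orthonormal columns.
    """
    projections = modes.T @ diff_A  # component of diff_A along each mode
    return float(np.linalg.norm(projections) / np.linalg.norm(diff_A))
```

If squared norms were intended instead, the same functions apply after squaring the returned values.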

Several conclusions can be drawn. First, and most importantly, the RMSIP values for pairs showing statistically-significant
alignment are clustered around 0.7, which reflects an excellent degree of correlation. In fact, this value exceeds by several
times the one expected for random "DALI-like" alignments. Second, a highly significant structural alignment, e.g. Z-score > 4,
implies a strong integrated dynamical correspondence, dynamical Z-score > 10, see Figure 6. This is indicative of a correlation
between similar protein movements and structural similarity. However, no precise common trend exists between the structural
and dynamical Z-scores, as visible in Figure 6. This may reflect the fact that the dynamical fluctuations play a different role
in different members of the PR family. Third, the non-matching regions are not dynamically "neutral", but are co-opted for
establishing the dynamical correspondence of the matching ones. In fact, when the non-matching regions are entirely omitted
from the coarse-grained dynamical analysis (see Methods), the corresponding dynamical accord decreases dramatically ("bare"
case of Figure 6).

Finally, all these conclusions are robust against the use of other common measures of dynamical consistency and statistical
relevance (e.g. linear or Kendall's correlation of covariance matrices) or if free enzymes, i.e. with no bound substrate, are used.

FIG. 6: Trend of the Z-scores for structural (DALI) and dynamical alignments for the 20 PR pairs of Table I. The pairs are ranked according to
the DALI Z-score. Pairs of common folds are highlighted in boldface. The dynamical Z-scores have been calculated using both the integrated
approach and the bare one. The former accounts correctly for the dynamical influence of the non-matching regions, while the latter neglects it
entirely (see Methods).



Concluding remarks 



Large-scale motions, which certainly occur in enzymes, have been increasingly suggested to play
a role for enzymatic function for Asp PR's 6 , whilst such a role has not emerged for other major PR classes, notably Ser PR's. In
the latter case, it has been strongly suggested by many groups that electrostatics is crucial for the enzyme 41,42 . It is therefore
interesting to consider the structural/dynamical alignments of common PR folds (highlighted in Table I and Figure 6) in the
light of these previous observations.

First, we notice that the Asp PR's (folds A and B) present the highest similarity of structural and dynamical features, providing
further support to the functional relevance of conformational fluctuations for these enzymes. Second, the other folds exhibiting
statistically-significant, yet far smaller, structural and dynamical scores involve a subfamily of Cys and Ser PR's, namely
caspase-like and subtilisin-like, along with metallo-proteases. As conformational fluctuations are expected not to be determinant for Ser
PR functionality 41,42 , it is tempting to conclude that these concerted motions might not play a critical role for enzymatic catalysis
in this subfamily. The question of why such a high degree of structural and dynamical similarity exists in this subfamily emerges
spontaneously. An appealing, yet highly speculative, answer is that the common features have been selected to maintain
the active site relatively rigid and therefore efficient for Ser PR catalysis or to bind and recognize the substrate 43 . The lowest
ranking alignment involving Asp PR's in Table I and Figure 6 is between retropepsins (fold B) and the trypsin-like Ser PR
representative (fold C), which could not be aligned with any other fold.

In summary, across selected PR's with different folds and catalytic chemistry we observed a strong consistency of the
essential dynamics around the active site. The intimate connection between the functional dynamics and enzymatic structure 12
reverberates in the strikingly similar spatial organization of the regions surrounding the active site. This suggests that evolutionary
pressure may have resulted in a conservation across the family not only of the structural features but also of the dynamical ones.
Considerable structural diversity is observed outside this region. Yet, this variability is not arbitrary but is co-opted to produce
consistent large-scale dynamics of the functional region. In some specific cases, and when the dynamical and structural
conservation has a high statistical significance, these results strongly suggest that the essential dynamical spaces have an important role
for enzymatic catalysis.

Acknowledgment. This work was supported by INFM - Democritos. We are indebted to Martino Bolognesi and Arthur Lesk 
for discussions and comments on the manuscript. 



1. A. J. Barrett, N. D. Rawlings, and J. F. Woessner, eds., Handbook of Proteolytic Enzymes; Elsevier, Amsterdam, second ed., 2004.
2. L. Stryer, Biochemistry; W.H. Freeman, New York, 4th ed., 1995.
3. L. Vandeputte-Rutten and P. Gros, Curr. Op. Struc. Biol., 2002, 12, 704-708.
4. J. D. Tyndall, T. Nall, and D. P. Fairlie, Chem. Rev., 2005, 105, 973-999.
5. T. L. Blundell and N. Srinivasan, Proc. Natl. Acad. Sci. USA, 1996, 93, 14243-14248.
6. M. Cascella, C. Micheletti, U. Rothlisberger, and P. Carloni, J. Am. Chem. Soc., 2005, 127, 3734-3742.
7. S. Piana, P. Carloni, and M. Parrinello, J. Mol. Biol., 2002, 319, 567-583.
8. M. M. Tirion, Phys. Rev. Lett., 1996, 77, 1905-1908.
9. A. R. Atilgan, S. R. Durell, R. L. Jernigan, M. C. Demirel, O. Keskin, and I. Bahar, Biophys. J., 2001, 80, 505-515.
10. T. Horiuchi and N. Go, Proteins, 1991, 10, 106-116.
11. I. Bahar, A. R. Atilgan, and B. Erman, Fold. & Des., 1997, 2, 173-181.
12. O. Keskin, R. L. Jernigan, and I. Bahar, Biophys. J., 2000, 78(4), 2093-2106.
13. C. Micheletti, P. Carloni, and A. Maritan, Proteins, 2004, 55, 635-645.
14. U. Hobohm and C. Sander, Prot. Sci., 1992, 2, 522.
15. A. M. Lesk, Introduction to Protein Science: Architecture, Function and Genomics; Oxford University Press, UK, 2004.
16. F. Pearl, A. Todd, I. Sillitoe, M. Dibley, O. Redfern, T. Lewis, C. Bennett, R. Marsden, A. Grant, D. Lee, A. Akpor, M. Maibaum,
    A. Harrison, T. Dallman, G. Reeves, I. Diboun, S. Addou, S. Lise, C. Johnston, A. Sillero, J. Thornton, and C. Orengo, Nucleic Acids
    Research, 2005, 33, D247-D251.
17. L. Holm and C. Sander, Science, 1996, 273, 595-603.
18. L. Holm and C. Sander, Nucl. Acid Res., 1997, 25, 231-234.
19. J. A. McCammon, B. R. Gelin, M. Karplus, and P. G. Wolynes, Nature, 1976, 262, 325-326.
20. V. Alexandrov, U. Lehnert, N. Echols, D. Milburn, D. Engelman, and M. Gerstein, Protein Science, 2005, 14, 633-643.
21. M. Delarue and Y. H. Sanejouand, J. Mol. Biol., 2002, 320(5), 1011-1024.
22. A. E. Garcia, Phys. Rev. Lett., 1992, 68, 2696-2699.
23. A. Amadei, A. B. M. Linssen, and H. J. C. Berendsen, Proteins, 1993, 17, 412-425.
24. B. Hess, Phys. Rev. E, 2002, 65, 031910.
25. K. Hinsen, A. J. Petrescu, S. Dellerue, M. C. Bellisent-Funel, and G. Kneller, Chem. Phys., 2000, 261, 25-37.
26. A. Amadei, M. A. Ceruso, and A. Di Nola, Proteins, 1999, 36, 419-424.
27. A. Pang, Y. Arinaminpathy, M. S. P. Sansom, and P. C. Biggin, Proteins: Structure, Function, and Bioinformatics, 2005, 61, 809-822.
28. M. Neri, M. Cascella, and C. Micheletti, J. Phys. Cond. Mat., 2005, 17, 1581-1593.
29. T. H. Rod, J. L. Radkiewicz, and C. L. Brooks, Proc. Natl. Acad. Sci. USA, 2003, 100, 6980-6985.
30. R. M. Daniel, R. V. Dunn, J. L. Finney, and J. C. Smith, Ann. Rev. Biophys. Biomol. Struct., 2003, 32, 69-92.
31. P. K. Agarwal, S. R. Billeter, P. T. Rajagopalan, S. J. Benkovic, and S. Hammes-Schiffer, Proc. Natl. Acad. Sci. USA, 2002, 99(5),
    2794-2799.
32. A. Tousignant and J. N. Pelletier, Chem. Biol., 2004, 11, 1037-1042.
33. G. M. Suel, S. W. Lockless, M. A. Wall, and R. Ranganathan, Nat. Str. Biol., 2003, 10(1), 59-69.
34. A. L. Perryman, J-H. Lin, and J. A. McCammon, Prot. Sci., 2004, 13, 1108-1123.
35. J. Luo and T. Bruice, Proc. Natl. Acad. Sci. USA, 2004, 101, 13152-13156.
36. E. Z. Eisenmesser, D. A. Bosco, M. Akke, and D. Kern, Science, 2002, 295, 1520-1523.
37. I. Bahar, A. R. Atilgan, M. C. Demirel, and B. Erman, Phys. Rev. Lett., 1998, 80, 2733-2736.
38. C. Micheletti, G. L. Lattanzi, and A. Maritan, J. Mol. Biol., 2002, 321, 909-921.
39. M. H. M. Olsson, W. W. Parson, and A. Warshel, Chem. Rev., 2006, NA.
40. S. Piana, P. Carloni, and U. Rothlisberger, Prot. Sci., 2002, 11, 2393-2402.
41. A. Warshel, G. Naray-Szabo, F. Sussman, and J. K. Hwang, Biochemistry, 1989, 28, 3629-3637.
42. T. Ishida and S. Kato, J. Am. Chem. Soc., 2003, 125, 12035-12048.
43. A. V. Finkelstein, Protein physics; Academic Press, Amsterdam, 2002.
44. F. C. Bernstein, T. F. Koetzle, G. J. Williams, E. E. Meyer, M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi,
    J. Mol. Biol., 1977, 112, 535-542.



